
    Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay

    Belief propagation (BP) is a fundamental message-passing algorithm for probabilistic reasoning and inference in graphical models. While it is known to be exact on trees, in most applications belief propagation is run on graphs with cycles. Understanding the behavior of "loopy" belief propagation has been a major challenge for researchers in machine learning, and positive convergence results for BP are known under strong assumptions which imply that the underlying graphical model exhibits decay of correlations. We show that under a natural initialization, BP converges quickly to the global optimum of the Bethe free energy for Ising models on arbitrary graphs, as long as the Ising model is ferromagnetic (i.e., neighbors prefer to be aligned). This holds even though such models can exhibit long-range correlations and may have multiple suboptimal BP fixed points. We also show an analogous result for iterating the (naive) mean-field equations; perhaps surprisingly, both results are dimension-free in the sense that a constant number of iterations already provides a good estimate of the Bethe/mean-field free energy. Comment: 24 pages; comments welcome
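A minimal sketch of loopy BP for a ferromagnetic Ising model, written in the standard "cavity field" form. Assumptions not taken from the abstract: the model is $p(x) \propto \exp(\sum_{i<j} J_{ij} x_i x_j + \sum_i h_i x_i)$ with $x_i \in \{-1, +1\}$, updates are synchronous, and the "natural initialization" is approximated by starting every message at a large positive field (a hypothetical all-plus start). The function names are illustrative only, and the code returns BP magnetizations rather than the Bethe free energy itself.

```python
import itertools
import math


def bp_ising(n, J, h, iters=50, init_field=10.0):
    """Loopy BP magnetizations for p(x) ∝ exp(sum J_ij x_i x_j + sum h_i x_i).

    J maps edges (i, j) with i < j to couplings; h is a list of external fields.
    init_field is a hypothetical large positive start (an "all-plus" guess).
    """
    nbrs = {i: [] for i in range(n)}
    coup = {}
    for (i, j), w in J.items():
        nbrs[i].append(j)
        nbrs[j].append(i)
        coup[(i, j)] = coup[(j, i)] = w

    def incoming(nu, k, i):
        # Field that neighbour k sends to i, given k's cavity field without i.
        return math.atanh(math.tanh(coup[(k, i)]) * math.tanh(nu[(k, i)]))

    # nu[(i, j)]: cavity field on x_i in the graph with edge (i, j) removed.
    nu = {(i, j): init_field for i in range(n) for j in nbrs[i]}
    for _ in range(iters):
        # Synchronous update: every new message uses only the previous ones.
        nu = {(i, j): h[i] + sum(incoming(nu, k, i) for k in nbrs[i] if k != j)
              for (i, j) in nu}
    return [math.tanh(h[i] + sum(incoming(nu, k, i) for k in nbrs[i]))
            for i in range(n)]


def exact_magnetizations(n, J, h):
    """Brute-force E[x_i] by enumerating all 2^n states (small n only)."""
    Z, m = 0.0, [0.0] * n
    for x in itertools.product((-1, 1), repeat=n):
        energy = sum(w * x[i] * x[j] for (i, j), w in J.items())
        energy += sum(h[i] * x[i] for i in range(n))
        p = math.exp(energy)
        Z += p
        for i in range(n):
            m[i] += p * x[i]
    return [mi / Z for mi in m]


if __name__ == "__main__":
    # A 4-cycle with ferromagnetic couplings: loopy BP is only approximate here.
    J = {(0, 1): 0.4, (1, 2): 0.4, (2, 3): 0.4, (0, 3): 0.4}
    h = [0.05] * 4
    print("BP   :", [round(v, 4) for v in bp_ising(4, J, h)])
    print("exact:", [round(v, 4) for v in exact_magnetizations(4, J, h)])
```

On a tree the recursion is exact, which gives a quick sanity check; on the 4-cycle the two printed vectors differ slightly, reflecting the Bethe approximation.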

    Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications

    Markov random fields are a popular model for high-dimensional probability distributions. Over the years, many mathematical, statistical and algorithmic problems on them have been studied. Until recently, the only known algorithms for provably learning them relied on exhaustive search, correlation decay or various incoherence assumptions. Bresler gave an algorithm for learning general Ising models on bounded degree graphs. His approach was based on a structural result about mutual information in Ising models. Here we take a more conceptual approach to proving lower bounds on the mutual information by setting up an appropriate zero-sum game. Our proof generalizes well beyond Ising models, to arbitrary Markov random fields with higher order interactions. As an application, we obtain algorithms for learning Markov random fields on bounded degree graphs on $n$ nodes with $r$-order interactions in $n^r$ time and $\log n$ sample complexity. The sample complexity is information theoretically optimal up to the dependence on the maximum degree. The running time is nearly optimal under standard conjectures about the hardness of learning parity with noise. Comment: 25 pages

    The Vertex Sample Complexity of Free Energy is Polynomial

    We study the following question: given a massive Markov random field on $n$ nodes, can a small sample from it provide a rough approximation to the free energy $\mathcal{F}_n = \log Z_n$? Results in the graph limit literature by Borgs, Chayes, Lovász, Sós, and Vesztergombi show that for Ising models on $n$ nodes and interactions of strength $\Theta(1/n)$, an $\epsilon$ approximation to $\log Z_n / n$ can be achieved by sampling a randomly induced model on $2^{O(1/\epsilon^2)}$ nodes. We show that the sampling complexity of this problem is polynomial in $1/\epsilon$. We further show that a polynomial dependence on $\epsilon$ cannot be avoided. Our results are very general, as they apply to higher order Markov random fields. For Markov random fields of order $r$, we obtain an algorithm that achieves an $\epsilon$ approximation using a number of samples polynomial in $r$ and $1/\epsilon$, and running time that is $2^{O(1/\epsilon^2)}$ up to polynomial factors in $r$ and $\epsilon$. For ferromagnetic Ising models, the running time is polynomial in $1/\epsilon$. Our results are intimately connected to recent research on the regularity lemma and property testing, where the interest is in finding which properties can be tested within $\epsilon$ error in time polynomial in $1/\epsilon$. In particular, our proofs build on results from a recent work by Alon, de la Vega, Kannan and Karpinski, who also introduced the notion of polynomial vertex sample complexity. Another critical ingredient of the proof is an effective bound by the authors of the paper relating the variational free energy and the free energy. Comment: arXiv admin note: text overlap with arXiv:1802.06126. Updated bibliography
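A toy sketch of the vertex-sampling idea for a dense Ising model with interactions of strength $\Theta(1/n)$. Everything concrete here is an assumption for illustration: the model $p(x) \propto \exp(\tfrac12 x^\top J x + h^\top x)$ with $J = \beta A / n$ for a symmetric 0/1 matrix $A$, the rule of rescaling the induced couplings by $1/k$ instead of $1/n$, and brute-force enumeration in place of the algorithms from the paper. It only shows how a random induced model on $k \ll n$ nodes yields an estimate of $\log Z_n / n$.

```python
import itertools

import numpy as np


def log_partition(J, h):
    """Exact log Z for p(x) ∝ exp(0.5 * x^T J x + h^T x) over x in {-1,+1}^n."""
    n = len(h)
    X = np.array(list(itertools.product((-1.0, 1.0), repeat=n)))
    energy = 0.5 * np.einsum("si,ij,sj->s", X, J, X) + X @ h
    top = energy.max()
    return top + np.log(np.exp(energy - top).sum())  # stable log-sum-exp


def sampled_free_energy(A, h, beta, k, rng):
    """Estimate log Z_n / n from a random induced model on k vertices.

    The induced couplings are rescaled to beta * A_S / k so that the sampled
    model keeps interactions of strength Theta(1/k) (an illustrative choice).
    """
    S = rng.choice(len(h), size=k, replace=False)
    J_S = beta * A[np.ix_(S, S)] / k
    return log_partition(J_S, h[S]) / k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k, beta = 14, 8, 0.5
    A = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
    A = A + A.T  # symmetric adjacency matrix with zero diagonal
    h = np.full(n, 0.1)
    full = log_partition(beta * A / n, h) / n
    estimates = [sampled_free_energy(A, h, beta, k, rng) for _ in range(20)]
    print(f"log Z_n / n = {full:.4f}, sampled mean = {np.mean(estimates):.4f}")
```

Brute force limits this to tiny $n$; the point is only the shape of the estimator, not its sample complexity.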

    Approximating Partition Functions in Constant Time

    We study approximations of the partition function of dense graphical models. Partition functions of graphical models play a fundamental role in statistical physics, in statistics and in machine learning. Two of the main methods for approximating the partition function are Markov Chain Monte Carlo and variational methods. An impressive body of work in mathematics, physics and theoretical computer science provides conditions under which Markov Chain Monte Carlo methods converge in polynomial time. These methods often lead to polynomial time approximation algorithms for the partition function in cases where the underlying model exhibits correlation decay. There are very few theoretical guarantees for the performance of variational methods. One exception is the recent result of Risteski (2016), who considered dense graphical models and showed that using variational methods, it is possible to find an $O(\epsilon n)$ additive approximation to the log partition function in time $n^{O(1/\epsilon^2)}$, even in a regime where correlation decay does not hold. We show that under essentially the same conditions, an $O(\epsilon n)$ additive approximation of the log partition function can be found in constant time, independent of $n$. In particular, our results cover dense Ising and Potts models as well as dense graphical models with $k$-wise interaction. They also apply to low threshold rank models. Comment: This preprint is completely subsumed by preprints arXiv:1802.06126 and arXiv:1802.06129 by the same authors, which also include important references that are missing in the current preprint

    Mean-field approximation, convex hierarchies, and the optimality of correlation rounding: a unified perspective

    The free energy is a key quantity of interest in Ising models, but unfortunately, computing it in general is computationally intractable. Two popular (variational) approximation schemes for estimating the free energy of general Ising models (in particular, even in regimes where correlation decay does not hold) are: (i) the mean-field approximation with roots in statistical physics, which estimates the free energy from below, and (ii) hierarchies of convex relaxations with roots in theoretical computer science, which estimate the free energy from above. We show, surprisingly, that the tight regime for both methods to compute the free energy to leading order is identical. More precisely, we show that the mean-field approximation is within $O((n\|J\|_F)^{2/3})$ of the free energy, where $\|J\|_F$ denotes the Frobenius norm of the interaction matrix of the Ising model. This simultaneously subsumes both the breakthrough work of Basak and Mukherjee, who showed the tight result that the mean-field approximation is within $o(n)$ whenever $\|J\|_F = o(\sqrt{n})$, as well as the work of Jain, Koehler, and Mossel, who gave the previously best known non-asymptotic bound of $O((n\|J\|_F)^{2/3}\log^{1/3}(n\|J\|_F))$. We give a simple, algorithmic proof of this result using a convex relaxation proposed by Risteski based on the Sherali-Adams hierarchy, automatically giving sub-exponential time approximation schemes for the free energy in this entire regime. Our algorithmic result is tight under Gap-ETH. We furthermore combine our techniques with spin glass theory to prove (in a strong sense) the optimality of correlation rounding, refuting a recent conjecture of Allen, O'Donnell, and Zhou. Finally, we give the tight generalization of all of these results to $k$-MRFs, capturing as a special case previous work on approximating MAX-$k$-CSP. Comment: This version: minor formatting changes, added grant acknowledgement
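A small numerical illustration of the mean-field side of this statement, on a toy instance. By the Gibbs variational principle, any product distribution with magnetizations $m$ gives the lower bound $F_{MF}(m) = \tfrac12 m^\top J m + h^\top m + \sum_i H\big(\tfrac{1+m_i}{2}\big) \le \log Z$, and coordinate ascent on the naive mean-field equations $m_i = \tanh((Jm)_i + h_i)$ can only increase that bound. Assumed conventions (not fixed by the abstract): $p(x) \propto \exp(\tfrac12 x^\top J x + h^\top x)$ with symmetric $J$ and zero diagonal. Exact $\log Z$ is computed by enumeration, so this only runs for small $n$, and it does not implement the Sherali-Adams relaxation.

```python
import itertools

import numpy as np


def xlogx(p):
    # p * log(p) with the convention 0 * log(0) = 0
    safe = np.where(p > 0, p, 1.0)
    return np.where(p > 0, p * np.log(safe), 0.0)


def mean_field_objective(J, h, m):
    """F_MF(m) = 0.5 m^T J m + h^T m + sum_i H((1 + m_i) / 2), a lower bound on log Z."""
    p = (1 + m) / 2
    entropy = -(xlogx(p) + xlogx(1 - p)).sum()
    return 0.5 * m @ J @ m + h @ m + entropy


def mean_field(J, h, sweeps=200, m0=None):
    """Coordinate ascent on m_i = tanh((J m)_i + h_i); assumes J has zero diagonal."""
    m = np.full(len(h), 0.5) if m0 is None else np.array(m0, dtype=float)
    for _ in range(sweeps):
        for i in range(len(h)):
            m[i] = np.tanh(J[i] @ m + h[i])  # J[i, i] = 0, so m[i] drops out
    return m, mean_field_objective(J, h, m)


def log_partition(J, h):
    """Exact log Z by enumeration over {-1, +1}^n (small n only)."""
    X = np.array(list(itertools.product((-1.0, 1.0), repeat=len(h))))
    energy = 0.5 * np.einsum("si,ij,sj->s", X, J, X) + X @ h
    top = energy.max()
    return top + np.log(np.exp(energy - top).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10
    G = rng.normal(size=(n, n)) / np.sqrt(n)
    J = (G + G.T) / 2
    np.fill_diagonal(J, 0.0)
    h = 0.3 * rng.normal(size=n)
    _, f_mf = mean_field(J, h)
    print(f"mean-field bound {f_mf:.4f} <= log Z {log_partition(J, h):.4f}")
```

With $J = 0$ the bound is tight, since the model is then itself a product distribution.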

    Learning Some Popular Gaussian Graphical Models without Condition Number Bounds

    Gaussian Graphical Models (GGMs) have wide-ranging applications in machine learning and the natural and social sciences. In most of the settings in which they are applied, the number of observed samples is much smaller than the dimension and the models are assumed to be sparse. While there are a variety of algorithms (e.g., Graphical Lasso, CLIME) that provably recover the graph structure with a logarithmic number of samples, they assume various conditions that require the precision matrix to be in some sense well-conditioned. Here we give the first polynomial-time algorithms for learning attractive GGMs and walk-summable GGMs with a logarithmic number of samples without any such assumptions. In particular, our algorithms can tolerate strong dependencies among the variables. Our result for structure recovery in walk-summable GGMs is derived from a more general result for efficient sparse linear regression in walk-summable models without any norm dependencies. We complement our results with experiments showing that many existing algorithms fail even in some simple settings where there are long dependency chains, whereas ours do not. Comment: V2: Updated version with some new results
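A short sketch of the two model classes, using the standard textbook definitions rather than anything specific to this paper: a GGM with precision matrix $\Theta$ is attractive if every off-diagonal entry of $\Theta$ is non-positive, and walk-summable if the spectral radius of $|R|$ is below 1, where $R = I - D^{-1/2} \Theta D^{-1/2}$ and $D = \mathrm{diag}(\Theta)$. The long chain is a hypothetical example of a model that is attractive and walk-summable yet badly conditioned, the kind of case a condition-number assumption would exclude.

```python
import numpy as np


def partial_correlation_matrix(theta):
    """R = I - D^{-1/2} Theta D^{-1/2}, with D the diagonal of the precision matrix."""
    d = np.sqrt(np.diag(theta))
    return np.eye(len(d)) - theta / np.outer(d, d)


def is_attractive(theta, tol=1e-12):
    """All off-diagonal precision entries <= 0, i.e. nonnegative partial correlations."""
    off_diag = theta - np.diag(np.diag(theta))
    return bool(np.all(off_diag <= tol))


def is_walk_summable(theta):
    """Spectral radius of the entrywise absolute value |R| is strictly below 1."""
    abs_R = np.abs(partial_correlation_matrix(theta))
    return bool(np.max(np.abs(np.linalg.eigvalsh(abs_R))) < 1)


def chain_precision(n, rho=0.49):
    """Tridiagonal precision matrix of a long attractive chain (hypothetical example)."""
    return np.eye(n) - rho * (np.eye(n, k=1) + np.eye(n, k=-1))


if __name__ == "__main__":
    theta = chain_precision(50)
    print("attractive:", is_attractive(theta))
    print("walk-summable:", is_walk_summable(theta))
    print("condition number: %.1f" % np.linalg.cond(theta))
```

Increasing `rho` toward 0.5 or lengthening the chain makes the condition number grow while both structural properties continue to hold.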

    Accuracy-Memory Tradeoffs and Phase Transitions in Belief Propagation

    The analysis of Belief Propagation and other algorithms for the reconstruction problem plays a key role in the analysis of community detection in inference on graphs, phylogenetic reconstruction in bioinformatics, and the cavity method in statistical physics. We prove a conjecture of Evans, Kenyon, Peres, and Schulman (2000) which states that any bounded memory message passing algorithm is statistically much weaker than Belief Propagation for the reconstruction problem. More formally, any recursive algorithm with bounded memory for the reconstruction problem on trees with the binary symmetric channel has a phase transition strictly below the Belief Propagation threshold, also known as the Kesten-Stigum bound. The proof combines, in a novel fashion, tools from recursive reconstruction, information theory, and optimal transport, and also establishes an asymptotic normality result for BP and other message-passing algorithms near the critical threshold. Comment: To be presented at COLT 201
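A Monte Carlo sketch of the reconstruction problem on trees, under assumptions chosen for illustration: a complete $d$-ary tree, a uniformly random $\pm 1$ root spin, and a binary symmetric channel that flips each edge independently with probability $\delta \in (0, 1/2)$. Leaves are observed exactly, and BP on the tree reduces to the recursion $m_v = \tanh\big(\sum_c \operatorname{artanh}(\theta\, m_c)\big)$ with $\theta = 1 - 2\delta$, which gives the exact posterior magnetization of each subtree root. The Kesten-Stigum bound is the condition $d\theta^2 > 1$. This sketch does not model the bounded-memory algorithms studied in the paper.

```python
import math
import random


def above_kesten_stigum(d, delta):
    """Kesten-Stigum condition d * (1 - 2 delta)^2 > 1 for a d-ary tree."""
    return d * (1 - 2 * delta) ** 2 > 1


def broadcast_and_bp(d, depth, delta, rng):
    """Broadcast a random root spin down the tree; return (root, BP magnetization)."""
    theta = 1 - 2 * delta

    def subtree(spin, level):
        # Leaves are observed exactly, so their magnetization is the spin itself.
        if level == depth:
            return float(spin)
        field = 0.0
        for _ in range(d):
            child = spin if rng.random() >= delta else -spin  # BSC(delta) edge
            field += math.atanh(theta * subtree(child, level + 1))
        return math.tanh(field)

    root = rng.choice((-1, 1))
    return root, subtree(root, 0)


def bp_success_rate(d, depth, delta, trials=300, seed=0):
    """Fraction of trials where the sign of the BP magnetization recovers the root."""
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(trials):
        root, m = broadcast_and_bp(d, depth, delta, rng)
        wins += 1.0 if root * m > 0 else (0.5 if m == 0 else 0.0)  # ties count half
    return wins / trials


if __name__ == "__main__":
    for delta in (0.05, 0.15, 0.25):
        print(delta, above_kesten_stigum(2, delta), bp_success_rate(2, 8, delta))
```

A bounded-memory variant would replace the real-valued magnetization with a quantized state; comparing its success rate against `bp_success_rate` near the threshold is the kind of gap the theorem formalizes.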

    Representational Power of ReLU Networks and Polynomial Kernels: Beyond Worst-Case Analysis

    There has been a large amount of interest, both in the past and particularly recently, in the power of different families of universal approximators, e.g., ReLU networks, polynomials, rational functions. However, current research has focused almost exclusively on understanding this problem in a worst-case setting, e.g., bounding the error of the best infinity-norm approximation in a box. In this setting a high-degree polynomial is required to even approximate a single ReLU. However, in real applications with high-dimensional data we expect that it is only important to approximate the desired function well on certain relevant parts of its domain. With this motivation, we analyze the ability of neural networks and polynomial kernels of bounded degree to achieve good statistical performance on a simple, natural inference problem with sparse latent structure. We give almost-tight bounds on the performance of both neural networks and low degree polynomials for this problem. Our bounds for polynomials involve new techniques which may be of independent interest and show major qualitative differences with what is known in the worst-case setting.
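To make the worst-case statement about a single ReLU concrete, the sketch below (an illustration, not the paper's method) interpolates $\max(x, 0)$ on $[-1, 1]$ with a degree-$k$ Chebyshev polynomial via NumPy's `Chebyshev.interpolate` and reports the sup-norm error on a fine grid. Because of the kink at 0 the error decreases only slowly, on the order of $1/k$, which is why high degree is needed in the infinity-norm setting.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def relu(x):
    return np.maximum(x, 0.0)


def sup_error(degree, grid_size=20001):
    """Max |relu - p| on [-1, 1] for the degree-`degree` Chebyshev interpolant p."""
    p = Chebyshev.interpolate(relu, degree)  # default domain is [-1, 1]
    x = np.linspace(-1.0, 1.0, grid_size)
    return float(np.max(np.abs(relu(x) - p(x))))


if __name__ == "__main__":
    for k in (4, 8, 16, 32, 64, 128):
        err = sup_error(k)
        print(f"degree {k:4d}: sup error {err:.5f}, degree * error {k * err:.3f}")
```

If `degree * error` stays roughly flat as the degree grows, that is the $1/k$ behaviour; a smooth target function would instead show the error dropping much faster.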

    A Phase Transition in Arrow's Theorem

    Arrow's Theorem concerns a fundamental problem in social choice theory: given the individual preferences of members of a group, how can they be aggregated to form rational group preferences? Arrow showed that in an election between three or more candidates, there are situations where any voting rule satisfying a small list of natural "fairness" axioms must produce an apparently irrational intransitive outcome. Furthermore, quantitative versions of Arrow's Theorem in the literature show that when voters choose rankings in an i.i.d. fashion, the outcome is intransitive with non-negligible probability. It is natural to ask if such a quantitative version of Arrow's Theorem holds for non-i.i.d. models. To answer this question, we study Arrow's Theorem under a natural non-i.i.d. model of voters inspired by canonical models in statistical physics; indeed, a version of this model was previously introduced by Raffaelli and Marsili in the physics literature. This model has a parameter, temperature, that prescribes the correlation between different voters. We show that the behavior of Arrow's Theorem in this model undergoes a striking phase transition: in the entire high temperature regime of the model, a Quantitative Arrow's Theorem holds showing that the probability of paradox for any voting rule satisfying the axioms is non-negligible; this is tight because the probability of paradox under pairwise majority goes to zero when approaching the critical temperature, and becomes exponentially small in the number of voters beyond it. We prove this occurs in another natural model of correlated voters and conjecture that this phenomenon is quite general. Comment: 48 pages; comments welcome
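The i.i.d. baseline that quantitative versions of Arrow's Theorem start from can be simulated directly. The sketch below assumes three candidates and voters who draw rankings independently and uniformly at random (the "impartial culture" model, not the correlated model of Raffaelli and Marsili), and estimates how often pairwise majority produces a cycle. With an odd number of voters there are no pairwise ties; the known large-electorate limit for this model is roughly 8.8%.

```python
import itertools
import random

CANDIDATES = "ABC"
RANKINGS = list(itertools.permutations(CANDIDATES))  # 6 strict rankings, best first


def majority_cycle(profile):
    """True if pairwise majority over the profile is intransitive (a Condorcet cycle)."""
    n = len(profile)
    beats = {
        (a, b): sum(r.index(a) < r.index(b) for r in profile) > n / 2
        for a, b in itertools.permutations(CANDIDATES, 2)
    }
    return (beats["A", "B"] and beats["B", "C"] and beats["C", "A"]) or (
        beats["A", "C"] and beats["C", "B"] and beats["B", "A"]
    )


def paradox_probability(voters=101, trials=1000, seed=0):
    """Monte Carlo estimate of P(cycle) when each voter picks a uniform ranking."""
    rng = random.Random(seed)
    hits = sum(
        majority_cycle([rng.choice(RANKINGS) for _ in range(voters)])
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    print("estimated paradox probability:", paradox_probability())
```

Replacing `rng.choice(RANKINGS)` with a correlated sampler is where a temperature-dependent voter model would plug in.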

    A Spectral Condition for Spectral Gap: Fast Mixing in High-Temperature Ising Models

    We prove that Ising models on the hypercube with general quadratic interactions satisfy a Poincaré inequality with respect to the natural Dirichlet form corresponding to Glauber dynamics, as soon as the operator norm of the interaction matrix is smaller than 1. The inequality implies a control on the mixing time of the Glauber dynamics. Our techniques rely on a localization procedure which establishes a structural result, stating that Ising measures may be decomposed into a mixture of measures with quadratic potentials of rank one, and provides a framework for proving concentration bounds for high temperature Ising models. Comment: Preliminary version
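A minimal sketch of single-site (heat-bath) Glauber dynamics for this setting. Assumptions: the measure is $p(x) \propto \exp(\tfrac12 x^\top J x + h^\top x)$ on $\{-1, +1\}^n$ with symmetric $J$, so the conditional law of $x_i$ given the other spins is $P(x_i = +1) = \tfrac12\big(1 + \tanh(\sum_{j \ne i} J_{ij} x_j + h_i)\big)$. The helper names are hypothetical, and the check `np.linalg.norm(J, 2) < 1` is the operator-norm condition from the abstract; the sketch only runs the chain and compares time averages with exact magnetizations, it does not verify the Poincaré inequality.

```python
import itertools

import numpy as np


def operator_norm_below_one(J):
    """The spectral condition ||J||_op < 1 (largest singular value of J)."""
    return np.linalg.norm(J, 2) < 1


def glauber_average(J, h, steps, rng, burn_in=1000):
    """Run heat-bath Glauber dynamics and return the time-averaged spin vector."""
    n = len(h)
    x = rng.choice([-1.0, 1.0], size=n)
    total = np.zeros(n)
    for t in range(burn_in + steps):
        i = rng.integers(n)  # pick a uniformly random site
        field = J[i] @ x - J[i, i] * x[i] + h[i]  # local field from the other spins
        x[i] = 1.0 if rng.random() < 0.5 * (1.0 + np.tanh(field)) else -1.0
        if t >= burn_in:
            total += x
    return total / steps


def exact_magnetizations(J, h):
    """E[x] by brute-force enumeration (small n only), same measure as above."""
    X = np.array(list(itertools.product((-1.0, 1.0), repeat=len(h))))
    energy = 0.5 * np.einsum("si,ij,sj->s", X, J, X) + X @ h
    w = np.exp(energy - energy.max())
    return (w[:, None] * X).sum(axis=0) / w.sum()


if __name__ == "__main__":
    n = 5
    J = 0.2 * (np.ones((n, n)) - np.eye(n))  # ||J||_op = 0.2 * (n - 1) = 0.8
    h = np.full(n, 0.1)
    rng = np.random.default_rng(0)
    print("spectral condition holds:", operator_norm_below_one(J))
    print("Glauber average:", np.round(glauber_average(J, h, 50000, rng), 3))
    print("exact          :", np.round(exact_magnetizations(J, h), 3))
```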